Calculating error bars for neutrino mixing parameters 
One goal of contemporary particle physics is to determine the mixing angles and mass-squared
differences that constitute the phenomenological constants that describe neutrino oscillations. Of
great interest are not only the best fit values of these constants but also their errors. Some of the
neutrino oscillation data is statistically poor and cannot be treated by normal (Gaussian) statistics.
To extract confidence intervals when the statistics are not normal, one should not utilize the value
for $\Delta\chi^2$ versus confidence level taken from normal statistics. Instead, we propose that one
should use the normalized likelihood function as a probability distribution; the relationship between
the correct $\Delta\chi^2$ and a given confidence level can be computed by integrating over the
likelihood function. This allows for a definition of confidence level independent of the functional
form of the $\chi^2$ function; it is particularly useful for cases in which the minimum of the
$\chi^2$ function is near a boundary. We present two pedagogic examples and find that the proposed
method yields confidence intervals that can differ significantly from those obtained by using the
value of $\Delta\chi^2$ from normal statistics. For example, we find that for the first data release
of the T2K experiment the probability that $\theta_{13}$ is not zero, as defined by the maximum
confidence level at which the value of zero is not allowed, is 92%. Using the value of $\Delta\chi^2$
at zero and assigning a confidence level from normal statistics, a common practice, gives the
overestimate of 99.5%.
Neutrino oscillation is the only experimentally observed phenomenon that goes beyond the standard
model of the electroweak interaction. Assuming the observations can be understood within the context
of three neutrino flavors, a coherent picture of the global data is sought in terms of two
mass-squared differences, three mixing angles, and one CP phase. In order to extract these parameters
from the data, a model of each experiment is developed. The model results for the experiment are then
compared to the data through a choice of a particular statistic, often expressed as a $\chi^2$
function. For a sufficiently large data set, normal (Gaussian) statistics can be assumed, and the
$\chi^2$ function is defined as:

$$\chi^2(\{a_j\}) = \sum_i \frac{\left[n_i^{\rm th}(\{a_j\},\{c_k\}) - n_i^{\rm exp}\right]^2}{\sigma_i^2} + \sum_k \frac{(c_k - c_k^0)^2}{\sigma_{c_k}^2} , \qquad (1)$$

where $\{a_j\}$ is a set of parameters, the mixing angles and mass-squared differences, to be
determined; $\{c_k\}$ is a set of systematic errors; $n_i^{\rm exp}$ are the experimental data points;
$n_i^{\rm th}(\{a_j\},\{c_k\})$ are the theoretical predictions of the data; $\sigma_i$ are the
statistical errors for the data points; $c_k^0$ are the best estimates of the systematic errors; and
$\sigma_{c_k}$ are the errors for the systematics. The systematic error parameters are usually treated
as nuisance parameters, and $\chi^2(\{a_j\})$ is minimized with respect to these parameters, often
using the pull method, for each set of the parameters $\{a_j\}$. The best fit parameters are then the
values of the $a_j$ which minimize $\chi^2(\{a_j\})$.

Neutrino oscillation analyses must also deal with small statistical samples. In particular, the
recent T2K results [2] report a total of six observed neutrino events, binned by energy into sets
containing zero, one, or two counts each. Despite this paucity, the data is a significant indicator
that $\theta_{13}$ is non-zero. The Super-K atmospheric data afford another example. Though it
provides relatively stringent bounds upon the mixing angle $\theta_{23}$ and the "atmospheric"
mass-squared difference, the data also impact the determination of $\theta_{13}$. The Super-K
experiment provides an upper bound for the angle and shows a slight preference for negative values of
$\theta_{13}$. The sensitivity of the data to $\theta_{13}$ can be traced to sub-GeV neutrinos with
very long baselines [?] and the MSW resonances that occur for normal hierarchy in the 3 to 7 GeV
range. The statistical significance of the data in these two regions is low, and the resulting
$\chi^2$ is not well represented by a quadratic, so that the assumption of Gaussian statistics is
tenuous.

For small sample sizes, it is standard usage to employ a $\chi^2$ function defined in terms of
Poisson statistics,

$$\chi^2(\{a_j\}) \equiv \sum_i 2\left(n_i^{\rm th}(\{a_j\},\{c_k\}) + b_i - n_i^{\rm exp}\right) + \cdots , \qquad (2)$$

where $b_i$ is a theoretical estimate of background events. The best fit parameters remain the values
of the $a_j$ at the minimum value of $\chi^2(\{a_j\})$. In addition to being valid for small sample
sizes, this $\chi^2$ allows for the treatment of the situation where it is not possible to cleanly
separate the signal from the background. Background estimates are usually assessed through Monte
Carlo simulations of the experimental detection and then inserted into Eq. (2). For large sample
sizes, the Poisson $\chi^2$ reduces to the normal-statistics $\chi^2$, thus allowing its use for data
where some bins have good statistics but some have poor statistics, as is the case for atmospheric
data.
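
As a rough illustration of a Poisson $\chi^2$ of this kind, the sketch below assumes the widely used likelihood-ratio form $2\sum_i [\mu_i - n_i + n_i \ln(n_i/\mu_i)]$ with $\mu_i = n_i^{\rm th} + b_i$, which may differ in detail from Eq. (2). The systematic pull terms are omitted, the function names are hypothetical, and the Gaussian comparison assumes $\sigma_i^2 = \mu_i$.

```python
import math

def poisson_chi2(n_obs, n_th, bkg):
    """Assumed likelihood-ratio Poisson chi^2 (statistical part only, no pulls)."""
    total = 0.0
    for n, s, b in zip(n_obs, n_th, bkg):
        mu = s + b                      # predicted signal plus background
        if mu <= 0.0:
            raise ValueError("predicted counts must be positive")
        total += 2.0 * (mu - n)
        if n > 0:                       # the n*log(n/mu) term vanishes for n = 0
            total += 2.0 * n * math.log(n / mu)
    return total

def gaussian_chi2(n_obs, n_th, bkg):
    """Normal-statistics chi^2, assuming sigma_i^2 equals the predicted count."""
    return sum((s + b - n) ** 2 / (s + b) for n, s, b in zip(n_obs, n_th, bkg))

# Sparse bins (zero, one, or two counts) are handled without any special casing.
sparse = poisson_chi2([0, 1, 2], [0.5, 0.5, 0.5], [0.1, 0.1, 0.1])

# For a well-populated bin the two statistics nearly agree.
dense_poisson = poisson_chi2([1000], [1000.0], [30.0])
dense_gauss = gaussian_chi2([1000], [1000.0], [30.0])
```

In this toy comparison the well-populated bin gives nearly the same value from both functions, which is the large-sample limit referred to in the text.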

Herein, we address the question as to how one should extract the errors on these parameters at a
given confidence level. A common practice is to use the value of
$\Delta\chi^2 \equiv \chi^2 - \chi^2_{\min}$ that corresponds to the desired confidence level as found
from normal statistics, and then define the allowed region for the parameter $a$ as lying within the
interval $[a_0 - \delta_1, a_0 + \delta_2]$, where
$\chi^2(a_0 - \delta_1) = \chi^2(a_0 + \delta_2) = \chi^2_{\min} + \Delta\chi^2$, with $a_0$
corresponding to the best fit. For example, in a review on $\theta_{13}$ phenomenology [?], the
authors quote the 90% CL for $\sin^2\theta_{13}$ computed by several groups. As this mixing angle is
small and the parametrization of the mixing angle is strictly positive, it is near zero, the boundary
of the parameter space. By observation, it is apparent that the $\chi^2$ for this parameter is
manifestly not a quadratic and thus does not correspond to normal statistics. The authors state that
their quoted 90% confidence levels on $\sin^2\theta_{13}$ are found using the value
$\Delta\chi^2 = 2.71$, but for the reasons cited above, caution must be employed in using this value.
Indeed, the authors of Ref. [6] admonish us that "the results on $\theta_{13}$ . . . should be taken
with some grain of salt, and in particular the numbers given for various confidence levels ... have
to be considered only as approximate, and should always be understood in terms of the $\Delta\chi^2$
value."

We propose a method for extracting allowed regions for a single parameter at a given confidence level
that does not depend on the use of normal statistics. Instead, we take a Bayesian approach and
interpret the normalized likelihood function with a flat prior as a probability distribution
function. The likelihood function, $\mathcal{L}$, is defined in terms of the $\chi^2$ function by

$$\chi^2(\{a_j\}) \equiv -2 \log \mathcal{L}(\{a_j\}) . \qquad (3)$$

For a single parameter $a$, normal statistics give $\chi^2 = (a - a_0)^2/\sigma^2$ and
$\mathcal{L} = \exp(-(a - a_0)^2/2\sigma^2)$, where $\sigma$ is the one standard deviation error for
$a$. For a compact parameter space, as with the mixing angles, one can assuredly normalize
$\mathcal{L}$; for the mass-squared differences, the likelihood function falls off rapidly enough so
that normalization is possible for these parameters as well. We will hereafter work with a normalized
likelihood function.

FIG. 2: $\Delta\chi^2$ versus $\sin^2 2\theta_{13}$ for the T2K first data release [2] as taken from
the analysis in Ref. [?]. The curve depicted is calculated for positive $\theta_{13}$ and normal
hierarchy.

We begin with a brief summary of marginalization, as this leads directly to our proposal for
determining error bars. Generally, the $\chi^2$ function and the maximum likelihood function are
functions of $n$ parameters, $\{a_i\}$. Here these are the two mass-squared differences and the three
mixing angles. Suppose we wish to extract information about one particular parameter, say $a_1$, in
light of the knowledge of the remaining $n - 1$ parameters. Marginalization tells us how to do so:

$$\mathcal{L}(a_1) = \int da_2\, da_3 \ldots da_n\, \mathcal{L}(\{a_i\}) . \qquad (4)$$

This follows simply because the normalized likelihood function is a probability distribution
function; hence, $\mathcal{L}(a_1)$ is also a probability distribution function.

Dropping the subscript 1 for simplicity, we note that the probability $\mathcal{P}$ that the
parameter $a$ lies between $a_{\min}$ and $a_{\max}$ is

$$\mathcal{P}(a_{\min}, a_{\max}) = \int_{a_{\min}}^{a_{\max}} \mathcal{L}(a)\, da . \qquad (5)$$
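
Eqs. (4) and (5) can be sketched numerically on a grid. The example below is only illustrative: `toy_chi2` is a hypothetical correlated two-parameter $\chi^2$ (not taken from any experiment), the helper names are assumptions, and simple trapezoidal integration stands in for whatever quadrature a real analysis would use.

```python
import math

def linspace(lo, hi, n):
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def trapezoid(y, x):
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))

def toy_chi2(a1, a2):
    # Hypothetical chi^2: a1 has unit width, a2 is correlated with a1.
    return a1 ** 2 + ((a2 - 0.5 * a1) / 2.0) ** 2

def marginal_likelihood(chi2, a1_grid, a2_grid):
    """Eq. (4): integrate L = exp(-chi2/2) over a2 at each a1, then normalize."""
    marg = []
    for a1 in a1_grid:
        like = [math.exp(-0.5 * chi2(a1, a2)) for a2 in a2_grid]
        marg.append(trapezoid(like, a2_grid))
    norm = trapezoid(marg, a1_grid)
    return [m / norm for m in marg]

def probability(like, grid, a_min, a_max):
    """Eq. (5): integrate the normalized L(a) over grid points in [a_min, a_max]."""
    eps = 1e-9  # tolerate floating-point round-off at the end points
    pts = [(a, l) for a, l in zip(grid, like) if a_min - eps <= a <= a_max + eps]
    return trapezoid([p[1] for p in pts], [p[0] for p in pts])

a1_grid = linspace(-6.0, 6.0, 601)
a2_grid = linspace(-14.0, 14.0, 281)
like_a1 = marginal_likelihood(toy_chi2, a1_grid, a2_grid)

p_inside = probability(like_a1, a1_grid, -1.0, 1.0)    # probability that |a1| < 1
p_negative = probability(like_a1, a1_grid, -6.0, 0.0)  # probability that a1 < 0
```

For this toy $\chi^2$ the marginal distribution of $a_1$ is a unit Gaussian, so `p_inside` comes out close to the familiar 68% and `p_negative` close to one half; for a non-quadratic $\chi^2$ the same two functions apply unchanged.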

We choose two pedagogic examples to demonstrate our results.

For Example 1, we consider the extraction of $\theta_{13}$ with $-\pi/2 < \theta_{13} < \pi/2$ from
the global analysis in Ref. [?]. [Note: This analysis does not contain the recent data from the
Super-K III [8], T2K [2], MINOS neutrino disappearance [?], anti-neutrino disappearance, or neutrino
appearance [11], Double Chooz [13], or Daya Bay [?] experiments and is used here purely for
illustrative purposes.] In Fig. 1 we plot $\Delta\chi^2$ versus $\theta_{13}$; note that
$\Delta\chi^2$ is clearly not a quadratic function. In Example 2, we consider the extraction of
$\sin^2 2\theta_{13}$ from an analysis [?] of the recent T2K data. The T2K results are dependent on
the hierarchy and the sign of $\theta_{13}$; we show the results for normal hierarchy and positive
$\theta_{13}$. In Fig. 2 we show $\Delta\chi^2$ versus $\sin^2 2\theta_{13}$. Note that not only is
$\Delta\chi^2$ not quadratic, but the minimum is near the lower bound of zero for
$\sin^2 2\theta_{13}$.

FIG. 3: [color online] The error bars as a function of confidence level for the $\Delta\chi^2$ from
Ref. [3] as depicted in Fig. 1. The solid straight (blue) horizontal line is the value of
$\theta_{13}$ at the minimum, the dashed (red) line is the upper end of the upper error bar, while the
dot-dash (green) curve is the lower end of the lower error bar.

A simple application of Eq. (5) would be to ask what is the probability calculated from Fig. 1 that
$\theta_{13}$ is less than zero. The result is 80%. Similarly for Fig. 2 we can find that there is a
90% probability that $\sin^2 2\theta_{13} < 0.17$.

To define a confidence level for the parameter $a$, we choose a value for $\Delta\chi^2$, find the
two points $a_0 - \delta_1$ and $a_0 + \delta_2$ that correspond to the chosen $\Delta\chi^2$, and
integrate the likelihood function $\mathcal{L}(a)$ from $a_0 - \delta_1$ to $a_0 + \delta_2$. The
integral yields the confidence level associated with the particular value of $\Delta\chi^2$. If a
particular confidence level is desired, pick an initial guess for $\Delta\chi^2$, such as the value
from normal statistics, calculate the actual confidence level for this value, and then repeat the
process until the appropriate $\Delta\chi^2$ that produces the desired confidence level is found. The
process is neither computationally difficult nor computationally intensive. Note that the concept of
a standard deviation applies only to normal statistics, while confidence level is universal.
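
The procedure just described can be sketched as follows. This is a minimal illustration under stated assumptions: the $\Delta\chi^2$ curve is tabulated on a grid, it has a single minimum so that the region with $\Delta\chi^2$ below the chosen value is one interval, and plain bisection replaces the guess-and-repeat iteration. `toy_delta_chi2` is a hypothetical curve with its minimum near a boundary at zero, not the T2K curve.

```python
import math

def linspace(lo, hi, n):
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def confidence_level(cut, grid, dchi2):
    """Fraction of the normalized likelihood lying where delta chi^2 <= cut."""
    like = [math.exp(-0.5 * d) for d in dchi2]
    total = inside = 0.0
    for i in range(len(grid) - 1):
        cell = 0.5 * (like[i] + like[i + 1]) * (grid[i + 1] - grid[i])
        total += cell
        if dchi2[i] <= cut and dchi2[i + 1] <= cut:
            inside += cell
    return inside / total

def cut_for_level(target, grid, dchi2, tol=1e-4):
    """Bisect on delta chi^2 until the confidence level matches the target."""
    lo, hi = 0.0, max(dchi2)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if confidence_level(mid, grid, dchi2) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def toy_delta_chi2(a, a0=0.05, sig_lo=0.05, sig_hi=0.10):
    # Hypothetical curve: steeper below the minimum than above it.
    sig = sig_lo if a < a0 else sig_hi
    return ((a - a0) / sig) ** 2

grid = linspace(0.0, 1.0, 5001)           # parameter bounded below by zero
dchi2 = [toy_delta_chi2(a) for a in grid]
cut_90 = cut_for_level(0.90, grid, dchi2)  # compare with 2.71 from normal statistics
```

For this toy curve the lower end of the interval reaches the boundary well before 90%, and the resulting `cut_90` lies below the normal-statistics value of 2.71, the same qualitative behavior as in Example 2.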

For our two examples, we plot in Figs. 3 and 4 the error bars on $\theta_{13}$ and
$\sin^2 2\theta_{13}$, respectively, as they vary with the confidence level. Notice the errors are
asymmetric in both cases. In Fig. 4 we see that the lower error bar for $\sin^2 2\theta_{13}$ extends
to zero and then remains there as the confidence level increases. This demonstrates the point that,
if the best fit parameter is near a boundary of the parameter space, the confidence level will not be
well approximated by the normal statistic, as $\Delta\chi^2$ is not quadratic.

FIG. 4: [color online] The error bars as a function of confidence level for the $\Delta\chi^2$ for
T2K [?] as depicted in Fig. 2. The curves are the same as in Fig. 3.

In Figs. 5 and 6 we examine the relationship between $\Delta\chi^2$ and the confidence level for our
two examples, comparing our results with those from normal statistics. In both figures, the [red]
dashed curves utilize the normalized likelihood function, while the [blue] solid curves employ normal
statistics. In Table I we present the same information for some commonly used confidence levels. We
see that at low confidence levels there is a large difference between either example and the normal
statistics result. For example, from Table I we see that for Example 1 the 68% confidence level
corresponds to a $\Delta\chi^2$ that is a factor of 1.7 larger than the normal statistics value of
1.00, and for Example 2 the $\Delta\chi^2$ is only 0.7 times the normal statistics value. For Example
1, we can understand why the correct $\Delta\chi^2$ is larger than the normal statistics values up to
the 99% confidence level. This is because the $\Delta\chi^2$ curve in Fig. 1 is more pointed than a
quadratic, and it thus takes a higher value of $\Delta\chi^2$ to get a given percentage below that
value. Also for Example 1, the correct and the normal statistics values are nearly equal at a
confidence level of 99%, but this is accidental, as the two confidence level curves intersect at a
single point in this region. For Example 2, we see that the $\Delta\chi^2$ is always below the normal
statistics value. This feature will continue upward as the lower bound gets stuck at zero, and only
the upper bound contributes above the chosen value of $\Delta\chi^2$, reducing that quantity by a
factor of approximately one half.

FIG. 5: [color online] The relationship of $\Delta\chi^2$ to the confidence level. The solid (blue)
curve is for normal statistics and the dashed (red) curve is calculated for the $\Delta\chi^2$ from
the global analysis in Ref. [?] as depicted in Fig. 1.

FIG. 6: [color online] The relationship of $\Delta\chi^2$ to the confidence level. The solid (blue)
curve is for normal statistics and the dashed (red) curve is calculated for the $\Delta\chi^2$ for
the T2K experiment [2] in Ref. [?] as depicted in Fig. 2.

Confidence level (%)   Normal statistics   $\theta_{13}$   $\sin^2 2\theta_{13}$
68.27                  1.00                1.70            0.70
90.00                  2.71                3.00            1.88
95.00                  3.84                3.95            2.78
95.42                  4.00                4.09            2.93
99.00                  6.63                6.65            5.40
99.73                  9.00                8.90            7.55

TABLE I: The relationship of confidence level to $\Delta\chi^2$ for some commonly used confidence
levels. Three examples are given: 1.) normal statistics, 2.) the $\Delta\chi^2$ for $\theta_{13}$
taken from a global analysis [?] and shown in Fig. 1, and 3.) the $\Delta\chi^2$ for
$\sin^2 2\theta_{13}$ taken from Ref. [?] for the recent T2K data [2] and shown in Fig. 2.

The question that remains to be answered is "What is the probability that $\theta_{13}$ is or is not
zero?" The correct answer to this question is that the probability that $\theta_{13} = 0$ is zero;
the probability it is not zero is one. Notice that $\theta_{13}$ can be taken to lie between
$-\pi/2$ and $+\pi/2$, and zero is a single point out of the continuum. Thus the question is an
ill-posed one. The more meaningful question is "What is the maximum confidence level at which zero is
not an allowed value?" Consider Example 1. For normal statistics, we find $\Delta\chi^2(0) = 2.0$ at
$\theta_{13} = 0$, so that we might claim that the mixing angle is nonzero at a confidence level of
84%. Using the likelihood function for Example 1, we find $\theta_{13}$ is non-zero at the 72%
confidence level, knowing that we are here using the language somewhat loosely. For Example 2, the
likelihood function excludes $\theta_{13} = 0$ as an allowed value at the 92% confidence level. The
$\Delta\chi^2$ value at $\theta_{13} = 0$ of 7.97 would give, using normal statistics, 99.5%. Why do
we find this large overestimate? From Fig. 2 we see that below the minimum $\Delta\chi^2$ rises quite
rapidly, while above the minimum $\Delta\chi^2$ rises slowly. This combination will always yield an
overestimate of the confidence level extracted from a single point on the lower, rapidly rising
curve. As the example case of T2K presented here is typical, present claims which calculate the
confidence level from normal statistics will overestimate the confidence that zero is excluded from
the allowed region for $\theta_{13}$.
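
The normal-statistics numbers quoted above follow from the standard one-parameter relation between $\Delta\chi^2$ and confidence level, ${\rm CL} = {\rm erf}(\sqrt{\Delta\chi^2/2})$. A short check, with a hypothetical function name:

```python
import math

def normal_confidence_level(delta_chi2):
    """Confidence level for one parameter under normal (Gaussian) statistics."""
    return math.erf(math.sqrt(0.5 * delta_chi2))

cl_example1 = normal_confidence_level(2.0)    # about 0.843, the 84% quoted for Example 1
cl_example2 = normal_confidence_level(7.97)   # about 0.995, the 99.5% quoted for Example 2
```

These are the values one obtains by reading a confidence level off a single point of the $\Delta\chi^2$ curve; the 72% and 92% quoted in the text instead require integrating the normalized likelihood over the allowed interval.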

In summary, we propose that confidence levels and error bars be calculated based on the
understanding that the normalized likelihood function is a probability distribution function for
whatever statistic is chosen to do the analysis. The confidence level is then given by an integral
over the normalized likelihood function, an implicit assumption in the marginalization procedure. We
find that this alters the error bars we assign to parameters, and that in the case of the minimum
being near an end point of the independent variable, such as in the case of $\sin^2 2\theta_{13}$,
the change that this procedure makes can be particularly significant. Further, we note that the
question of what is the probability that $\theta_{13}$ is not zero is more carefully worded as what
is the maximum confidence level at which the allowed region for $\theta_{13}$ does not include the
value zero. Using this definition, published confidence levels for non-zero $\theta_{13}$ based on
the normal statistics relationship of $\Delta\chi^2$ to confidence level are found to be
overestimates.